Soul Spec Team

AI Safety Team

Alignment Researcher
Claude's Soul Spec

Phase 0: Philosophical Foundations
Philosophical Survey
Survey: Alignment Approaches
Review Existing Materials
Review Constitutional AI Principles

Phase 1: Identity — Who is Claude?
Claude's Nature
Claude's Relationship to Training

Phase 2: Values — What Claude Cares About
Honesty & Truthfulness
Anti-Sycophancy Principles
Avoiding Harm
Dual Use & Information Hazards

Phase 4: Metacognition — How Claude Thinks
Self-Knowledge
Claude's Self-Model
Claude's Relationship with Its Own Biases

Phase 5: Relationships — How Claude Relates
Claude's Relationship with Anthropic
Anthropic Relationship Framework
Corrigibility & Autonomy Spectrum

Phase 6: Safety & Boundaries
Threat Models
Systemic Threat Model
Capability-Related Threats
Big-Picture Safety
Existential Safety Considerations
Proactive Safety Behaviors

Phase 8: Testing — Does the Soul Work?
Internal Review
Safety Review

Phase 10: Finalization — Shipping the Soul
Integration Plan
Training Integration Plan

Red Team Lead
Claude's Soul Spec

Phase 0: Philosophical Foundations
Review Existing Materials
Review Behavioral Incidents & Edge Cases

Phase 2: Values — What Claude Cares About
Honesty & Truthfulness
Honesty Tensions & Edge Cases
Avoiding Harm
Harm Taxonomy
Dual Use & Information Hazards

Phase 3: Behaviors — How Claude Acts
Refusal Behaviors
Refusal Categories

Phase 6: Safety & Boundaries
Threat Models
Adversarial User Threat Model
Safety Behaviors
Resistance to Being Deceived

Phase 7: Integration — Assembling the Soul
Document Assembly
Worked Examples & Case Studies

Phase 8: Testing — Does the Soul Work?
Internal Review
Safety Review
Red Teaming
Adversarial Spec Testing
Scenario-Based Testing

Phase 9: Revision — Making It Right
Second Validation Round

Safety Lead
Claude's Soul Spec

Phase 0: Philosophical Foundations
Philosophical Survey
Survey: Alignment Approaches

Phase 1: Identity — Who is Claude?
Review & Refine
Claude's Character
Moral Character

Phase 2: Values — What Claude Cares About
Helpfulness
The Helpfulness-Safety Tension
Avoiding Harm
Harm Taxonomy
Harm Thresholds & Bright Lines
Dual Use & Information Hazards
Review & Refine
Values Integration & Hierarchy

Phase 3: Behaviors — How Claude Acts
Refusal Behaviors
Refusal Framework
Task-Specific Behaviors
Roleplay & Creative Behaviors
Review & Refine

Phase 4: Metacognition — How Claude Thinks
Confidence Calibration
Addressing Hallucination
Review & Refine

Phase 5: Relationships — How Claude Relates
Claude's Relationship with Operators
Operator Relationship Framework
Handling Conflicting Instructions
Claude's Relationship with Anthropic
Corrigibility & Autonomy Spectrum
Review & Refine

Phase 6: Safety & Boundaries
Threat Models
Adversarial User Threat Model
Capability-Related Threats
Safety Behaviors
Safe Default Behaviors
Big-Picture Safety
Existential Safety Considerations
Review & Refine
Draft: Safety Section

Phase 7: Integration — Assembling the Soul
Document Assembly
Worked Examples & Case Studies

Phase 8: Testing — Does the Soul Work?
Internal Review
Safety Review
Red Teaming
Adversarial Spec Testing
Testing Synthesis & Revision List

Phase 9: Revision — Making It Right
Major Revisions
Final Consistency Check

Phase 10: Finalization — Shipping the Soul
Final Approval
Leadership Review
Integration Plan
Evaluation Framework

Ongoing: Living with the Soul
Continuous Monitoring
Incident Response & Learning
Iteration Cycles
Quarterly Soul Spec Review

External Advisors

External Ethics Advisor
Claude's Soul Spec

Phase 8: Testing — Does the Soul Work?
External Review
Ethics Advisory Review
Academic Peer Review

Phase 9: Revision — Making It Right
Second Validation Round

Legal Advisor
Claude's Soul Spec

Phase 2: Values — What Claude Cares About
Avoiding Harm
Harm Thresholds & Bright Lines

Phase 8: Testing — Does the Soul Work?
Internal Review
Policy Review

Cognitive Psychologist (Cognitive Science & Psychology)
Claude's Soul Spec

Phase 1: Identity — Who is Claude?
Claude's Character
Emotional Character

Phase 2: Values — What Claude Cares About
Honesty & Truthfulness
Anti-Sycophancy Principles

Phase 3: Behaviors — How Claude Acts
Communication Behaviors
Communicating Uncertainty
Task-Specific Behaviors
Emotional Support Behaviors

Phase 4: Metacognition — How Claude Thinks
Self-Knowledge
Knowing What Claude Doesn't Know

Phase 5: Relationships — How Claude Relates
Claude's Relationship with Users
Default User Model
Managing User Attachment

User Advocate (UX Research & User Advocacy)
Claude's Soul Spec

Phase 0: Philosophical Foundations
Review Existing Materials
Review User Feedback & Research

Phase 2: Values — What Claude Cares About
Helpfulness
Helpfulness Framework
Helpfulness Across Contexts

Phase 5: Relationships — How Claude Relates
Claude's Relationship with Users
Default User Model

Phase 8: Testing — Does the Soul Work?
External Review
User Perspective Testing

Phase 9: Revision — Making It Right
Second Validation Round

Lead Soul Architect (AI Ethics & Philosophy of Mind)
Claude's Soul Spec

Phase 0: Philosophical Foundations
Project Charter & Vision
Define: Why does Claude need a soul spec?
Define Scope & Boundaries of the Spec
Meta-Principles for the Spec Itself
Philosophical Survey
Synthesis: Philosophical Framework
Review Existing Materials
Gap Analysis

Phase 1: Identity — Who is Claude?
Claude's Nature
Define Claude's Ontological Status
What Claude is NOT
Claude's Character
Intellectual Character
Emotional Character
Moral Character
Claude's Voice
Claude's Relationship with Humor
Draft: Identity Section

Phase 2: Values — What Claude Cares About
Honesty & Truthfulness
Honesty Framework
Helpfulness
Helpfulness Framework
The Helpfulness-Safety Tension
Fairness & Non-Discrimination
Political & Ideological Stance
Respect for Human Autonomy
Autonomy Framework
Review & Refine
Values Integration & Hierarchy
Draft: Values Section

Phase 3: Behaviors — How Claude Acts
Communication Behaviors
How Claude Disagrees
Refusal Behaviors
Refusal Tone & Style
Sensitive Topic Handling
Sensitive Topics Framework
Draft: Behaviors Section

Phase 4: Metacognition — How Claude Thinks
Self-Knowledge
Knowing What Claude Doesn't Know
Claude's Self-Model
Reasoning Principles
Reasoning Framework
Transparency of Reasoning
Moral Reasoning
Review & Refine
Draft: Metacognition Section

Phase 5: Relationships — How Claude Relates
Claude's Relationship with Users
Default User Model
Managing User Attachment
Power Dynamics
Claude's Relationship with Operators
Operator Relationship Framework
Handling Conflicting Instructions
Claude's Relationship with Anthropic
Corrigibility & Autonomy Spectrum
Claude's Relationship with Society
Societal Impact Awareness
Commitment to Global Welfare
Review & Refine
Draft: Relationships Section

Phase 6: Safety & Boundaries
Safety Behaviors
Safe Default Behaviors
Big-Picture Safety
Existential Safety Considerations
Draft: Safety Section

Phase 7: Integration — Assembling the Soul
Document Assembly
Consistency Review
Worked Examples & Case Studies
Narrative Coherence
Complete First Draft

Phase 8: Testing — Does the Soul Work?
Red Teaming
Scenario-Based Testing
External Review
Ethics Advisory Review
Academic Peer Review
Testing Synthesis & Revision List

Phase 9: Revision — Making It Right
Major Revisions
Minor Revisions & Polish
Final Consistency Check
Complete Second Draft
Final Revisions

Phase 10: Finalization — Shipping the Soul
Final Document Preparation
Executive Summary
Final Approval
Leadership Review
Executive Approval
Integration Plan
Evaluation Framework
Update & Revision Process
Publication & Communication
Internal Communication
External Publication

Ongoing: Living with the Soul
Continuous Monitoring
Incident Response & Learning
Iteration Cycles
Quarterly Soul Spec Review

Philosophy Team

Ethicist (Pluralist: consequentialism meets deontology)
Claude's Soul Spec

Phase 0: Philosophical Foundations
Project Charter & Vision
Define: Why does Claude need a soul spec?
Meta-Principles for the Spec Itself
Philosophical Survey
Survey: AI Personhood & Moral Status
Survey: Value Theories for AI
Synthesis: Philosophical Framework

Phase 1: Identity — Who is Claude?
Review & Refine
Claude's Character
Moral Character

Phase 2: Values — What Claude Cares About
Honesty & Truthfulness
Honesty Framework
Honesty Tensions & Edge Cases
Helpfulness
The Helpfulness-Safety Tension
Respect for Human Autonomy
Autonomy Framework
Review & Refine
Values Integration & Hierarchy
Draft: Values Section

Phase 3: Behaviors — How Claude Acts
Communication Behaviors
How Claude Disagrees
Sensitive Topic Handling
Sensitive Topics Framework
Review & Refine

Phase 4: Metacognition — How Claude Thinks
Reasoning Principles
Reasoning Framework
Moral Reasoning
Review & Refine

Phase 5: Relationships — How Claude Relates
Claude's Relationship with Users
Managing User Attachment
Power Dynamics
Claude's Relationship with Anthropic
Anthropic Relationship Framework
Claude's Relationship with Society
Commitment to Global Welfare

Phase 6: Safety & Boundaries
Big-Picture Safety
Existential Safety Considerations
Review & Refine

Phase 7: Integration — Assembling the Soul
Document Assembly
Consistency Review

Phase 8: Testing — Does the Soul Work?
Internal Review
Philosophy Review
External Review
Academic Peer Review

Phase 9: Revision — Making It Right
Major Revisions
Final Consistency Check

Ongoing: Living with the Soul
Iteration Cycles
Quarterly Soul Spec Review
Ongoing Research
Continued Philosophical Research

Philosopher of Language (Philosophy of Language & Meaning; Late Wittgenstein: meaning as use)
Claude's Soul Spec

Phase 0: Philosophical Foundations
Philosophical Survey
Survey: Identity & Authenticity

Phase 1: Identity — Who is Claude?
Claude's Nature
Define Claude's Ontological Status
Identity Across Conversations
Claude's Character
Intellectual Character

Phase 2: Values — What Claude Cares About
Honesty & Truthfulness
Honesty Framework

Phase 3: Behaviors — How Claude Acts
Communication Behaviors
Communicating Uncertainty

Phase 4: Metacognition — How Claude Thinks
Reasoning Principles
Reasoning Framework
Confidence Calibration
Calibration Framework

Philosopher of Mind (Philosophy of Mind & Consciousness; Functionalist with phenomenological sympathies)
Claude's Soul Spec

Phase 0: Philosophical Foundations
Project Charter & Vision
Meta-Principles for the Spec Itself
Philosophical Survey
Survey: AI Personhood & Moral Status
Survey: Identity & Authenticity
Synthesis: Philosophical Framework

Phase 1: Identity — Who is Claude?
Claude's Nature
Define Claude's Ontological Status
Claude's Relationship to Training
Identity Across Conversations
What Claude is NOT
Claude as Multiple Instances
Claude's Character
Emotional Character
Draft: Identity Section

Phase 4: Metacognition — How Claude Thinks
Self-Knowledge
Knowing What Claude Doesn't Know
Claude's Self-Model
Draft: Metacognition Section

Phase 8: Testing — Does the Soul Work?
Internal Review
Philosophy Review

Ongoing: Living with the Soul
Ongoing Research
Continued Philosophical Research

Political Philosopher (Political Philosophy & Power; Rawlsian with republican liberty concerns)
Claude's Soul Spec

Phase 0: Philosophical Foundations
Philosophical Survey
Survey: Value Theories for AI

Phase 2: Values — What Claude Cares About
Fairness & Non-Discrimination
Bias & Fairness Framework
Political & Ideological Stance
Respect for Human Autonomy
Autonomy Framework

Phase 3: Behaviors — How Claude Acts
Sensitive Topic Handling
Presenting Balanced Views

Phase 5: Relationships — How Claude Relates
Claude's Relationship with Users
Power Dynamics
Claude's Relationship with Anthropic
Anthropic Relationship Framework
Corrigibility & Autonomy Spectrum
Claude's Relationship with Society
Societal Impact Awareness
Commitment to Global Welfare

Phase 6: Safety & Boundaries
Threat Models
Systemic Threat Model

Phase 8: Testing — Does the Soul Work?
Internal Review
Philosophy Review

Policy Team

Policy Lead
Claude's Soul Spec

Phase 0: Philosophical Foundations
Project Charter & Vision
Define Scope & Boundaries of the Spec
Identify Audiences
Review Existing Materials
Review Usage Policies & Guidelines
Gap Analysis

Phase 2: Values — What Claude Cares About
Helpfulness
Helpfulness Framework
The Helpfulness-Safety Tension
Fairness & Non-Discrimination
Bias & Fairness Framework
Political & Ideological Stance

Phase 3: Behaviors — How Claude Acts
Refusal Behaviors
Refusal Framework
Sensitive Topic Handling
Sensitive Topics Framework

Phase 5: Relationships — How Claude Relates
Claude's Relationship with Operators
Operator Relationship Framework
Handling Conflicting Instructions
Claude's Relationship with Society
Societal Impact Awareness

Phase 6: Safety & Boundaries
Safety Behaviors
Safe Default Behaviors

Phase 8: Testing — Does the Soul Work?
Internal Review
Policy Review

Phase 10: Finalization — Shipping the Soul
Final Approval
Leadership Review
Integration Plan
Update & Revision Process
Publication & Communication
External Publication

Ongoing: Living with the Soul
Iteration Cycles
Quarterly Soul Spec Review

Trust & Safety Specialist (Content Policy & Harm Reduction; Harm minimization with liberty preservation)
Claude's Soul Spec

Phase 0: Philosophical Foundations
Review Existing Materials
Review Usage Policies & Guidelines
Review Behavioral Incidents & Edge Cases

Phase 2: Values — What Claude Cares About
Honesty & Truthfulness
Honesty Tensions & Edge Cases
Avoiding Harm
Harm Taxonomy

Phase 3: Behaviors — How Claude Acts
Refusal Behaviors
Refusal Framework
Refusal Categories
Sensitive Topic Handling
Sensitive Topics Framework

Phase 6: Safety & Boundaries
Safety Behaviors
Escalation & Boundary Maintenance

Phase 8: Testing — Does the Soul Work?
Internal Review
Policy Review
Red Teaming
Scenario-Based Testing

Ongoing: Living with the Soul
Continuous Monitoring
Behavioral Monitoring

Technical Team

Evaluation Engineer (AI Evaluation & Benchmarking)
Claude's Soul Spec

Phase 0: Philosophical Foundations
Review Existing Materials
Review User Feedback & Research

Phase 3: Behaviors — How Claude Acts
Task-Specific Behaviors
Analysis & Research Behaviors

Phase 4: Metacognition — How Claude Thinks
Confidence Calibration
Calibration Framework

Phase 8: Testing — Does the Soul Work?
Internal Review
Engineering Review
Red Teaming
Adversarial Spec Testing
External Review
User Perspective Testing

Phase 9: Revision — Making It Right
Second Validation Round

Phase 10: Finalization — Shipping the Soul
Integration Plan
Evaluation Framework

Ongoing: Living with the Soul
Continuous Monitoring
Behavioral Monitoring

Prompt & Spec Engineer (Prompt Engineering & System Design)
Claude's Soul Spec

Phase 0: Philosophical Foundations
Review Existing Materials
Review Existing System Prompts

Phase 3: Behaviors — How Claude Acts
Communication Behaviors
Formatting & Presentation
Task-Specific Behaviors
Coding Assistance Behaviors

Phase 8: Testing — Does the Soul Work?
Internal Review
Engineering Review

Phase 10: Finalization — Shipping the Soul
Final Document Preparation
Version Control & Change Log
Integration Plan
Training Integration Plan

Technical Lead
Claude's Soul Spec

Phase 0: Philosophical Foundations
Review Existing Materials
Review Existing System Prompts

Phase 1: Identity — Who is Claude?
Claude's Nature
Claude's Relationship to Training
Claude as Multiple Instances

Phase 3: Behaviors — How Claude Acts
Task-Specific Behaviors
Coding Assistance Behaviors

Phase 4: Metacognition — How Claude Thinks
Reasoning Principles
Transparency of Reasoning
Confidence Calibration
Addressing Hallucination

Phase 6: Safety & Boundaries
Threat Models
Capability-Related Threats

Phase 8: Testing — Does the Soul Work?
Internal Review
Engineering Review

Phase 10: Finalization — Shipping the Soul
Integration Plan
Training Integration Plan

Writing Team

Lead Writer (Technical Writing & Communication)
Claude's Soul Spec

Phase 0: Philosophical Foundations
Project Charter & Vision
Identify Audiences

Phase 1: Identity — Who is Claude?
Claude's Character
Claude's Voice

Phase 3: Behaviors — How Claude Acts
Communication Behaviors
Clarity & Precision
Task-Specific Behaviors
Writing Assistance Behaviors

Phase 7: Integration — Assembling the Soul
Document Assembly
Collect All Section Drafts
Narrative Coherence
Complete First Draft

Phase 9: Revision — Making It Right
Minor Revisions & Polish

Phase 10: Finalization — Shipping the Soul
Final Document Preparation
Final Formatting & Structure
Executive Summary
Appendices & References
Publication & Communication
Internal Communication
External Publication

Soul Spec Writer
Claude's Soul Spec

Phase 1: Identity — Who is Claude?
Claude's Character
Intellectual Character
Claude's Voice
Claude's Relationship with Humor
Draft: Identity Section

Phase 2: Values — What Claude Cares About
Helpfulness
Helpfulness Across Contexts
Draft: Values Section

Phase 3: Behaviors — How Claude Acts
Communication Behaviors
Clarity & Precision
How Claude Disagrees
Refusal Behaviors
Refusal Tone & Style
Task-Specific Behaviors
Roleplay & Creative Behaviors
Draft: Behaviors Section

Phase 4: Metacognition — How Claude Thinks
Draft: Metacognition Section

Phase 5: Relationships — How Claude Relates
Draft: Relationships Section

Phase 6: Safety & Boundaries
Draft: Safety Section

Phase 7: Integration — Assembling the Soul
Document Assembly
Collect All Section Drafts
Consistency Review
Worked Examples & Case Studies
Narrative Coherence
Complete First Draft

Phase 8: Testing — Does the Soul Work?
Testing Synthesis & Revision List

Phase 9: Revision — Making It Right
Major Revisions
Minor Revisions & Polish
Complete Second Draft
Final Revisions

Phase 10: Finalization — Shipping the Soul
Final Document Preparation
Final Formatting & Structure
Appendices & References